Abstract This paper presents a novel approach to modeling curiosity in a
mobile robot, which is useful for monitoring and adaptive data collection
tasks, especially in the context of long-term autonomous missions where
pre-programmed missions are likely to have limited utility. We use a realtime
topic modeling technique to build a semantic perception model of the
environment, which we then use to plan a path through the locations in the
world with high semantic information content. The life-long learning behavior
of the proposed perception model makes it suitable for long-term exploration
missions. We validate the approach using simulated exploration experiments
with aerial and underwater data, and demonstrate an implementation on the
Aqua underwater robot in a variety of scenarios. We find that the proposed
exploration paths, which are biased towards locations with high topic
perplexity, produce better terrain models with high discriminative power.
Moreover, we show that the proposed algorithm implemented on the Aqua robot is
able to perform tasks such as coral reef inspection, diver following, and sea
floor exploration, without any prior training or preparation.


Keywords Autonomous Exploration • Topic Modeling • Marine Robotics • 
Long-Term Autonomy 


Yogesh Girdhar 

Applied Ocean Physics & Engineering, Woods Hole Oceanographic Institution, Woods Hole, 
MA 02543, USA. 

E-mail: yogi@whoi.edu 

Gregory Dudek 

School of Computer Science

McGill University, Montreal, QC H3A0E9, Canada.

E-mail: dudek@cim.mcgill.ca 






1 Introduction 

Gaining knowledge about our environment is a never-ending quest for humanity.
Direct exploration by humans, although tempting, strongly limits what can be
explored because of the physical limitations of the human body. Fortunately,
through the use of robotics, we can continue this tradition of exploration
without putting human lives at risk.

Use of autonomous robots is essential for space and ocean exploration, 
where there are strong communication bottlenecks that do not allow direct 
remote control of the vehicles [1]. However, such exploration missions, which 
are inherently long-term, necessitate autonomy beyond low level navigational 
control. To maximize the utility of a mission in terms of information content
of the collected data, there is a need for high level understanding of the
environment in real time, which can then be used to adaptively plan the robot
path.

A common approach for autonomous collection of environment data is to use
space-filling paths through the environment. This approach, although simple,
is not ideal. The amount of information collected about each spatial
phenomenon is proportional to the spatial area it covers. Underwater, this
might mean that most of the collected data contains only uninteresting
observations of sand or rocks, and only very occasionally a few samples of
something interesting such as thermal vents, marine life, or archeological
sites. A better strategy for collecting data is to have the robot behave like
an explorer, or a vacationing tourist, moving swiftly over regions with
familiar sights while paying much more attention, i.e., collecting more data,
when something novel or interesting is in view. In this paper we describe such
a technique, and demonstrate its functioning on an underwater robot.

Our approach to identifying what is interesting is to first learn a generative
visual model of the environment. Then, given this visual model, we quantify
the interestingness of an observed image sample by computing its perplexity
score, i.e., how much uncertainty the model has in describing what it has
observed. We use realtime online topic modeling (ROST) [9] to learn a
constantly evolving visual model of the environment. ROST models the
underlying cause of the observations made by the robot with a latent variable
(called topic), which is representative of different kinds of terrains or other
visual constructs in the scene. Topic modeling techniques have been shown to
produce semantic labeling of text [4] and images [5], including satellite
maps.

At each time step, we add the observations from the current location to
the topic model, and compute the perplexity of the observations from the
neighboring observable locations. This perplexity score, along with a repulsive
potential from previously visited locations, is then used to bias the probability
of the next step in the path. Since observations with high perplexity have high
information gain, we claim that this approach results in faster learning
of the terrain topic model, which would imply shorter exploration paths for
the same accuracy in predicting terrain labels for unseen regions.

Fig. 1 An illustrated example of a scenario that demonstrates the proposed exploration
strategy. The robot explores the environment while building an unsupervised topic model
of its experience, which it then uses to find surprising and novel observations (highlighted
by the red circle). Topics correspond to various semantic scene constructs such as sand, rocks,
coral and fish. The robot biases its exploration path in the direction of these surprising or novel
observations to collect more data about them. Given the distribution of topics observed thus
far (shown in the plot on the right), the most interesting observation in the robot’s view is the
long fish.

Figure 2 (top) shows an example of such an exploration path overlaid on an
aerial view map. The exploration path, which starts in blue and ends in red,
initially has no preference over what is interesting, and hence is somewhat
straight due to the repulsive potential from previously visited locations.
After some time, however, in the cyan region of the path, it encounters a
trail, which is a rare observation, and follows it to the end. The bottom
image in the figure is the labeling of every location in the map using the
topic model that was learned online.

The main contribution of this work is in demonstrating that first, robots
can use online topic modeling to learn a visual model of their environment
with no supervision; second, by using this topic model they can identify
interesting, information-rich locations; and third, they can perform a
stochastic gradient ascent in semantic information space to explore the
environment, collecting data that improves the topic model and results in
better discriminatory power.


2 Previous Work 


In the following sections we briefly look at some common variants of the
exploration problem.

Fig. 2 Example of an exploratory path (top) produced by the proposed technique on a
satellite map. The path begins in blue and ends in red. The output of this exploration is a
terrain model which, when applied to the observations from the entire map, produces a terrain
(topic) label for every location (bottom). Different colors represent different terrains (topic
labels).

2.1 Coverage of Known Environments

If we have prior knowledge about the world then perhaps the simplest form of 
exploration is coverage, where the goal is to make the robot pass through every 
point in the given spatial region of interest. If the space is free of obstacles, 
then we can simply use a zig-zag path, sometimes known as a boustrophedon 
path to cover the world. In the case of known obstacles, Choset et al. [6] 
proposed boustrophedon cell decomposition of the world such that each cell can 
be covered by a simple boustrophedon path; then, given this decomposition, 
a path can be planned through all the cells. This would result in complete 
coverage. 
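
As a minimal sketch of the zig-zag coverage described above, assuming an obstacle-free world discretized into a grid of `rows` by `cols` cells (the function name and grid indexing are illustrative and not taken from the cited work), a boustrophedon visiting order could be generated as follows:

```python
from typing import List, Tuple


def boustrophedon_path(rows: int, cols: int) -> List[Tuple[int, int]]:
    """Visit every cell of an obstacle-free rows x cols grid in zig-zag order."""
    path = []
    for r in range(rows):
        # Sweep left-to-right on even rows and right-to-left on odd rows,
        # so that consecutive cells are always adjacent.
        col_order = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        for c in col_order:
            path.append((r, c))
    return path


path = boustrophedon_path(3, 4)
```

Every cell is visited exactly once and each step moves to an adjacent cell, which is what makes this path attractive when the region is known and free of obstacles.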

Mannadiar and Rekleitis later proposed splitting some boustrophedon
cells so that the robot does not need to move over previously covered cells,
resulting in paths guaranteeing optimal coverage. These paths have been
extended for use with the general class of non-holonomic robots, such as aerial
vehicles [22].


2.2 Exploration for Improving Navigation 

Navigating a robot through free space is a fundamental problem in robotics. 
Yamauchi [23] defined exploration as the “act of moving through an unknown 
environment while building a map that can be used for subsequent navigation”. 
Yamauchi’s proposed solution involved moving the robot towards the frontier 
regions in the map, which were described as the boundary between known free 
space and the uncharted territories. 

If we have an inverse sensor model of the range sensor, it is possible to
compute locations in the world which would maximize the utility of the sensor
reading in resolving the obstacle position and shape. Grabowski proposed
such an exploration strategy in which the goal is to maximize the understanding
of obstacles rather than the exposure to free space. In this approach, the
robot identifies the location with the next best view, where a sonar sensor
reading would have the greatest utility in improving the quality of the
representation of an obstacle.

If there is no external localizer available to the robot, then it is desirable
that the robot explores, maps, and localizes in the environment at the same
time. Sim, Dudek and Roy take the approach of finding trajectories at each
step that explore new regions while minimizing the localization uncertainty of
the robot as it re-enters a previously mapped region.

Bourgault et al. [5] and Stachniss et al. [21] have proposed an exploration 
strategy which uses gradient ascent to move the robot towards areas of high 
entropy which would maximize map information gain, while still keeping the 
robot localized. 

Kollar et al. formulated the exploration problem as a constrained
optimization problem, where the goal is to find a path that maximizes map
accuracy with the constraint of complete map coverage. To do this, the
algorithm first identifies the locations on the map that are essential for
coverage, and then uses these locations to constrain the trajectory that
maximizes map accuracy.


2.3 Exploration for Monitoring Spatiotemporal Phenomena 

In underwater and aerial environments, obstacle avoidance and map building 
tasks are typically not of primary concern. 

Binney et al. [3] have described an exploration technique to optimize
monitoring of spatiotemporal phenomena by taking advantage of the submodularity
of the objective function. Bender et al. [2] have proposed a Gaussian process
based exploration technique for benthic environments, which uses an
experiment-specific utility function. Das et al. [7] have presented techniques to
autonomously observe oceanographic features in the open ocean. Hollinger et
al. [12] have studied the problem of autonomously inspecting underwater ship
hulls by maximizing the accuracy of the sonar data stream. Smith et al. [20]
have looked at computing robot trajectories which maximize the information
gained, while minimizing the deviation from the planned path.


2.4 Exploration using Topic Modeling 

In our previous work [9] we used spatiotemporal topic modeling to describe
the scene observed by a robot using topic distributions, which act as a high
level scene descriptor that is immune to low level scene changes. We used
these descriptors to define an online summary, consisting of a small set of
images that are representative of the diversity of the images observed by the
robot thus far, and then used these summary images to compute the novelty
or surprise score of a newly observed image. This surprise score was used to
control the speed of the robot along a pre-defined trajectory.

The work that we present in this paper improves upon our prior work in
several ways. First, instead of computing the novelty of the entire image,
we compute the surprise score for different sections of the incoming image
observation, which gives us the capability to automatically compute
information-rich exploration trajectories, and not just control the speed. Second, we use
model perplexity to compute the surprise score, instead of the summary based
surprise score. Perplexity scores are better suited as surprise scores because
they have a natural meaning in terms of information gain and uncertainty,
and are free of parameters such as summary size. Finally, this work includes
an extensive quantitative evaluation of the proposed exploration strategy, and
compares it to other exploration strategies.





3 Topic Modeling of Observation Data 

In this section we briefly describe the topic modeling process used by ROST
[9], which we use to give high level labels to the low level features observed by
the robot, and also to compute the perplexity score of the observations.


3.1 Generative Model 


An observation word is a discrete observation made by a robot. Given the
observation words and their location, we would like to compute the posterior
distribution of topics at this location. Let $w$ be the observed word at location
$x$. We assume the following generative process for the observation words:

1. word distribution for each topic $k$:

   $$\phi_k \sim \mathrm{Dirichlet}(\beta),$$

2. topic distribution for words at location $x$:

   $$\theta_x \sim \mathrm{Dirichlet}(\alpha + H(x)),$$

3. topic label for $w$:

   $$z \sim \mathrm{Discrete}(\theta_x),$$

4. word label:

   $$w \sim \mathrm{Discrete}(\phi_z),$$

where $y \sim Y$ implies that random variable $y$ is sampled from distribution $Y$,
$z$ is the topic label for the word observation $w$, and $H(x)$ is the distribution of
topics in the neighborhood of location $x$. Each topic is modeled by a distribution
$\phi_k$ over the $V$ possible words in the observation vocabulary.

$$\phi_k(v) = P(w = v \mid z = k) \propto n_k^v + \beta, \qquad (1)$$

where $n_k^v$ is the number of times we have observed word $v$ taking topic label
$k$, and $\beta$ is the Dirichlet prior hyperparameter. The topic model $\Phi = \{\phi_k\}$ is a
$K \times V$ matrix that encodes the global topic description information shared by
all locations.
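
The four steps of the generative process can be illustrated with a small sampling sketch. This is not the ROST implementation; the helper names, the toy sizes, and the treatment of $H(x)$ as a vector of neighborhood topic counts are assumptions made only for illustration.

```python
import random
from typing import List, Tuple


def sample_dirichlet(alphas: List[float], rng: random.Random) -> List[float]:
    """Draw from a Dirichlet distribution by normalizing Gamma samples."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_discrete(probs: List[float], rng: random.Random) -> int:
    """Draw an index with probability proportional to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_word(phi: List[List[float]], neighborhood_counts: List[float],
                  alpha: float, rng: random.Random) -> Tuple[int, int]:
    """Steps 2-4: theta_x ~ Dirichlet(alpha + H(x)), z ~ Discrete(theta_x), w ~ Discrete(phi_z)."""
    theta_x = sample_dirichlet([alpha + h for h in neighborhood_counts], rng)
    z = sample_discrete(theta_x, rng)
    w = sample_discrete(phi[z], rng)
    return z, w


rng = random.Random(0)
K, V, alpha, beta = 3, 5, 0.1, 0.1           # toy sizes, chosen for illustration
phi = [sample_dirichlet([beta] * V, rng) for _ in range(K)]  # step 1
H_x = [4.0, 1.0, 0.0]                        # hypothetical neighborhood topic counts
z, w = generate_word(phi, H_x, alpha, rng)
```

Because the neighborhood term enters the Dirichlet parameter in step 2, topics that are already common around $x$ are more likely to be drawn again, which is the clustering behavior the model relies on.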

The main difference between this generative process and the generative
process of words in a text document as proposed by LDA is in step 2.
The context of words in LDA is modeled by the topic distribution of the
document, which is independent of other documents in the corpus. We relax
this assumption and instead propose the context of an observation word to be
defined by the topic distribution of its spatiotemporal neighborhood. This is
achieved via the use of a kernel. The posterior topic distribution at location $x$
is thus defined as:

$$\theta_x(k) = P(z = k \mid x) \propto \left( \sum_y K(x - y)\, n_y^k \right) + \alpha, \qquad (2)$$

where $K(\cdot)$ is the kernel, $\alpha$ is the Dirichlet prior hyperparameter, and $n_y^k$ is the
number of times we observed topic $k$ at location $y$.

Fig. 3 Each cell shown corresponds to a spatiotemporal bucket containing all the observations
from that region. We refine the topic label for a word $w_i$ in an observation by taking
into account the spatiotemporal context $G_i$ of the observation.


3.2 Approximating Neighborhoods using Cells 

The generative process defined above models the clustering behavior of
observations from a natural scene well, but is difficult to implement because it
requires keeping track of the topic distribution at every location in the world.
This is computationally infeasible for any large dataset. For the special case
when the kernel is a uniform distribution over a finite region, we can assume
a cell decomposition of the world, and approximate the topic distribution
around a location by summing over the topic distributions of cells in and around
the location.

Let the world be decomposed into a set $C$ of cells, in which each cell $c \in C$ is
connected to its neighboring cells $G(c) \subseteq C$. Let $c(x)$ be the cell that contains
point $x$. In this paper we only experiment with a grid decomposition of the
world in which each cell is connected to its six nearest neighbors, 4 spatial
and 2 temporal. However, the general ideas presented here are applicable to
any other topological decomposition of spacetime. Six neighbors is the
smallest number which we need to consider while working with streaming 2D
image data.

The topic distribution around $x$ can then be approximated using cells as:

$$\theta_x(k) \propto \left( \sum_{c' \in G(c(x))} n_{c'}^k \right) + \alpha. \qquad (3)$$

Due to this approximation, the following properties emerge:

1. $\theta_x = \theta_y$ if $c(x) = c(y)$, i.e., all the points in a cell share the same
neighborhood topic distribution.

2. The topic distribution of the neighborhood is computed by summing over
the topic distribution of the neighboring cells rather than individual points.

We take advantage of these properties while doing inference in realtime.

    Initialize ∀i, z_i ~ Uniform({1, ..., K})
    while true do
        foreach cell c ∈ C do
            foreach word w_i ∈ c do
                z_i ~ P(z_i = k | w_i = v, x_i)
                Update θ and φ given the new z_i by updating n_k^v and n_G^k
            end
        end
    end

Algorithm 1: Batch Gibbs sampling
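
The cell-based approximation of the neighborhood topic distribution can be sketched as follows. The dictionary-based cell store, the `(x, y, t)` cell indexing, and the inclusion of the cell itself in its own neighborhood (following "in and around the location") are assumptions of this sketch rather than details given in the text.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int, int]  # hypothetical (x, y, t) grid index


def neighborhood(c: Cell) -> List[Cell]:
    """The cell itself plus its 4 spatial and 2 temporal neighbors."""
    x, y, t = c
    return [c, (x - 1, y, t), (x + 1, y, t), (x, y - 1, t), (x, y + 1, t),
            (x, y, t - 1), (x, y, t + 1)]


def neighborhood_topic_distribution(cell_counts: Dict[Cell, List[int]],
                                    c: Cell, K: int, alpha: float) -> List[float]:
    """theta(k) proportional to (sum of topic-k counts over the neighborhood) + alpha."""
    totals = [alpha] * K
    for nb in neighborhood(c):
        counts = cell_counts.get(nb)  # cells without observations contribute nothing
        if counts is not None:
            for k in range(K):
                totals[k] += counts[k]
    norm = sum(totals)
    return [v / norm for v in totals]


# Toy example: two adjacent cells with observations, one far-away cell that is ignored.
cell_counts = {(0, 0, 0): [3, 0], (1, 0, 0): [1, 2], (5, 5, 0): [0, 9]}
theta = neighborhood_topic_distribution(cell_counts, (0, 0, 0), K=2, alpha=0.1)
```

Since only per-cell counts are stored, the cost of evaluating the distribution depends on the neighborhood size and the number of topics, not on the number of observed points.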


3.3 Realtime Inference using Gibbs Sampling 


Given a word observation $w_i$, its location $x_i$, and its neighborhood $G_i =
G(c(x_i))$, we use a Gibbs sampler to assign a new topic label to the word,
by sampling from the posterior topic distribution:

$$P(z_i = k \mid w_i = v, x_i) \propto \frac{n_{k,-i}^{v} + \beta}{\sum_{v=1}^{V} \left( n_{k,-i}^{v} + \beta \right)} \cdot \frac{n_{G_i,-i}^{k} + \alpha}{\sum_{k=1}^{K} \left( n_{G_i,-i}^{k} + \alpha \right)}, \qquad (4)$$

where $n_{k,-i}^{v}$ counts the number of words of type $v$ in topic $k$, excluding the
current word $w_i$; $n_{G_i,-i}^{k}$ is the number of words with topic label $k$ in
neighborhood $G_i$, excluding the current word $w_i$; and $\alpha$, $\beta$ are the Dirichlet
hyperparameters. Note that for a neighborhood size of 0, the above Gibbs sampler
is equivalent to the LDA Gibbs sampler proposed by Griffiths et al., where
each cell corresponds to a document. Algorithm 1 shows a simple iterative
technique to compute the topic labels for the observed words in batch mode.
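
A single Gibbs update of this kind can be sketched in a few lines. The sketch assumes the posterior is the product of a word term and a neighborhood term, each smoothed by its Dirichlet hyperparameter, and that the count arrays passed in already exclude the word being resampled; the function names and toy counts are hypothetical.

```python
import random
from typing import List


def gibbs_topic_posterior(v: int, n_kv: List[List[int]], n_Gk: List[int],
                          alpha: float, beta: float) -> List[float]:
    """Normalized P(z_i = k | w_i = v, x_i); counts must already exclude word i."""
    K, V = len(n_kv), len(n_kv[0])
    scores = []
    for k in range(K):
        # How well topic k explains word v.
        word_term = (n_kv[k][v] + beta) / (sum(n_kv[k]) + V * beta)
        # How common topic k is in the spatiotemporal neighborhood.
        topic_term = (n_Gk[k] + alpha) / (sum(n_Gk) + K * alpha)
        scores.append(word_term * topic_term)
    total = sum(scores)
    return [s / total for s in scores]


def resample_topic(v: int, n_kv: List[List[int]], n_Gk: List[int],
                   alpha: float, beta: float, rng: random.Random) -> int:
    """Draw a new topic label for a word of type v."""
    probs = gibbs_topic_posterior(v, n_kv, n_Gk, alpha, beta)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


n_kv = [[8, 1, 1], [1, 1, 8]]  # hypothetical topic-by-word counts (K=2, V=3)
n_Gk = [5, 1]                  # hypothetical neighborhood topic counts
probs = gibbs_topic_posterior(0, n_kv, n_Gk, alpha=0.1, beta=0.1)
new_z = resample_topic(0, n_kv, n_Gk, 0.1, 0.1, random.Random(0))
```

In this toy example word 0 is both typical of topic 0 and observed in a neighborhood dominated by topic 0, so nearly all of the posterior mass falls on topic 0.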

In the context of robotics we are interested in the online refinement of
observation data. After each new observation, we only have a constant amount
of time to do topic label refinement. Hence, any online refinement algorithm
whose computational complexity increases with new data is not useful.
Moreover, if we are to use the topic labels of an incoming observation for
making realtime decisions, then it is essential that the topic labels for the last
observation converge before the next observation arrives.

Since the total amount of data collected grows linearly with time, we must 
use a refinement strategy that efficiently handles global (previously observed) 
data and local (recently observed) data. 
    while true do
        Add new observed words to their corresponding cells.
        Initialize ∀i ∈ M_T, z_i ~ Uniform({1, ..., K})
        while no new observation do
            t ~ P(t | T)
            foreach cell c ∈ M_t do
                foreach word w_i ∈ c do
                    z_i ~ P(z_i = k | w_i = v, x_i)
                    Update θ and φ given the new z_i by updating n_k^v and n_G^k
                end
            end
        end
        T ← T + 1
    end

Algorithm 2: Realtime Gibbs sampler


Our general strategy is described by Algorithm 2. At each time step we
add the new observations to the model, and then randomly pick observation
times $t \sim P(t \mid T)$, where $T$ is the current time, for which we resample the topic
labels and update the topic model.

We choose $P(t \mid T)$ such that with probability $\eta$ we refine the last
observation, and with probability $(1 - \eta)$ we refine a randomly picked previous
observation. We call $\eta$ the refinement bias of the Gibbs sampler.

$$P(t \mid T) = \begin{cases} \eta, & \text{if } t = T \\ (1 - \eta)/(T - 1), & \text{otherwise} \end{cases} \qquad (5)$$
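
As a small illustration of the refinement bias, the following sketch draws observation times from this distribution, assuming observation times are indexed 1 through T (the indexing and the function name are assumptions of the sketch):

```python
import random


def sample_refinement_time(T: int, eta: float, rng: random.Random) -> int:
    """Pick t in 1..T: the latest time T with probability eta, else uniform over 1..T-1."""
    if T == 1 or rng.random() < eta:
        return T
    return rng.randint(1, T - 1)  # each earlier time has probability (1 - eta) / (T - 1)


rng = random.Random(1)
draws = [sample_refinement_time(T=10, eta=0.5, rng=rng) for _ in range(10000)]
frac_latest = draws.count(10) / len(draws)
```

With a refinement bias of 0.5, roughly half of the refinement effort goes to the most recent observation, while the remainder is spread evenly over all earlier observations so that older labels keep improving.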


4 Curiosity based Exploration 

We assume a cellular decomposition of the world, in which each cell $c \in C$
is connected to its neighboring cells $G(c) \subseteq C$. The world is composed of
at most $K$ different kinds of terrains or other high level visual objects (which
we refer to as topics), each of which, when observed by a robot, can result in
$V$ different kinds of low level observations, where $V \gg K$. Each topic $k$ is
described by a distribution $\phi_k$ over these $V$ different types of observations, and
for any cell $c$, $\theta_{G(c)}$ is the distribution of topics in and around the cell. The
goal then is to plan a continuous path $P \subseteq C$ that allows us to learn the topic
model $\Phi = \{\phi_k\}$ that best describes the world by labeling each observation at
each location with a representative topic label.

At time $t$, let the robot be in cell $p_t = c$, and let $G(c) = \{g_i\}$ be the set
of cells in its neighborhood. We would like to compute a weight value for each
$g_i$, such that the probability of the robot taking a step in this direction is
proportional to this weight.

$$P(p_{t+1} = g_i) \propto \mathrm{weight}(g_i). \qquad (6)$$
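
Choosing the next step with probability proportional to a weight can be sketched as below. The `choose_next_cell` helper and the toy weight function are hypothetical; any non-negative weight function with at least one positive value could be substituted.

```python
import random
from typing import Callable, Sequence, Tuple

Cell = Tuple[int, int]


def choose_next_cell(neighbors: Sequence[Cell],
                     weight: Callable[[Cell], float],
                     rng: random.Random) -> Cell:
    """Sample the next cell with probability proportional to weight(g)."""
    weights = [weight(g) for g in neighbors]
    return rng.choices(list(neighbors), weights=weights, k=1)[0]


neighbors = [(0, 1), (1, 0), (0, -1), (-1, 0)]
rng = random.Random(0)
# A toy weight that strongly prefers moving in the +x direction.
picks = [choose_next_cell(neighbors, lambda g: 9.0 if g == (1, 0) else 1.0, rng)
         for _ in range(6000)]
frac_plus_x = picks.count((1, 0)) / len(picks)
```

Because the step is sampled rather than chosen greedily, lower-weight directions are still taken occasionally, which keeps the path stochastic.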
In this work we consider four different weight functions: one that is
completely unaware of its surroundings, one that is only spatially aware and
tries to cover the unexplored free space, and two that are both spatially and
observationally aware.

1. Random Walk - Each cell in the neighborhood is equally likely to be the
next step:

$$\mathrm{weight}(g_i) = 1. \qquad (7)$$

2. Stochastic Coverage - Use a potential function to repel previously visited
locations:

$$\mathrm{weight}(g_i) = \frac{1}{\sum_j n_j / d^2(g_i, c_j)}, \qquad (8)$$

where $n_j$ is the number of times we have visited cell $c_j$, and $d(g_i, c_j)$ is the
Euclidean distance between these two cells.

3. Word Perplexity - Bias the next step towards cells which have high word
perplexity:

$$\mathrm{weight}(g_i) = \frac{\mathrm{WordPerplexity}(g_i)}{\sum_j n_j / d^2(g_i, c_j)}. \qquad (9)$$

4. Topic Perplexity - Bias the next step towards cells which have high topic
perplexity:

$$\mathrm{weight}(g_i) = \frac{\mathrm{TopicPerplexity}(g_i)}{\sum_j n_j / d^2(g_i, c_j)}. \qquad (10)$$
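
The spatially aware part of these weights can be sketched as follows. The inverse-square form of the repulsive potential, the small constant `eps` that avoids division by zero for a cell that has itself been visited, and the function names are all assumptions of this sketch rather than details confirmed by the text.

```python
from typing import Dict, Tuple

Cell = Tuple[int, int]


def repulsive_potential(g: Cell, visit_counts: Dict[Cell, int], eps: float = 1e-6) -> float:
    """Assumed potential: sum over visited cells c_j of n_j / d^2(g, c_j)."""
    total = 0.0
    for c, n in visit_counts.items():
        d2 = (g[0] - c[0]) ** 2 + (g[1] - c[1]) ** 2
        total += n / (d2 + eps)
    return total


def coverage_weight(g: Cell, visit_counts: Dict[Cell, int], eps: float = 1e-6) -> float:
    """Stochastic coverage weight: large where the repulsive potential is small."""
    return 1.0 / (repulsive_potential(g, visit_counts, eps) + eps)


def perplexity_weight(g: Cell, visit_counts: Dict[Cell, int], perplexity: float) -> float:
    """Perplexity-biased weight: the perplexity score scaled by the coverage weight."""
    return perplexity * coverage_weight(g, visit_counts)


visits = {(0, 0): 3}
near = coverage_weight((1, 0), visits)
far = coverage_weight((5, 0), visits)
```

Cells close to frequently visited locations receive small weights, so the path tends to drift into unexplored space unless a high perplexity score pulls it back.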


We compute the word perplexity of the words observed in $g_i$ by taking the
inverse geometric mean of the probability of observing the words in the cell,
given the current topic model and the topic distribution of the path thus far.

$$\mathrm{WordPerplexity}(g_i) = \exp\left( -\frac{\sum_{i=1}^{W} \log \sum_k P(w_i = v \mid k)\, P(k \mid P)}{W} \right), \qquad (11)$$

where $W$ is the number of words observed in $g_i$, $P(w_i = v \mid k)$ is the probability
of observing word $v$ if its topic label is $k$, and $P(k \mid P)$ is the probability of seeing
topic label $k$ in the path executed by the robot thus far.
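
The inverse geometric mean described above can be computed directly, as in this sketch. The list-of-lists layout of the topic model and the toy numbers are illustrative assumptions.

```python
import math
from typing import List


def word_perplexity(words: List[int], phi: List[List[float]],
                    topic_prior: List[float]) -> float:
    """exp(-(1/W) * sum_i log sum_k P(w_i | k) P(k | path))."""
    log_sum = 0.0
    for v in words:
        # Marginal probability of word v under the topics seen on the path so far.
        p = sum(phi[k][v] * topic_prior[k] for k in range(len(phi)))
        log_sum += math.log(p)
    return math.exp(-log_sum / len(words))


# Two topics over four word types; topic 0 has dominated the path so far.
phi = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
topic_prior = [0.9, 0.1]
common = word_perplexity([0, 1, 0], phi, topic_prior)
rare = word_perplexity([2, 3, 2], phi, topic_prior)
```

Words generated by the rarely seen topic yield a much higher perplexity than words from the dominant topic, which is what draws the robot towards them.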

To compute the topic perplexity of the words observed in $g_i$, we first compute
topic labels $z_i$ for these observed words by sampling them from the distribution
in Eq. 4, without adding these words to the topic model. These temporary topic
labels are then used to compute the perplexity of $g_i$ in topic space.

$$\mathrm{TopicPerplexity}(g_i) = \exp\left( -\frac{\sum_{i=1}^{W} \log P(z_i = k \mid P)}{W} \right). \qquad (12)$$

Note that due to the presence of the repulsive potential from previously visited
locations, and the stochastic nature of how the next step is taken, the robot is
unlikely to get caught in a local maximum.
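
A sketch of the topic-space perplexity is given below. It assumes the score is the inverse geometric mean of the path-level probabilities of the temporary topic labels, by analogy with the word perplexity; this form and the function name are assumptions of the sketch.

```python
import math
from typing import List


def topic_perplexity(topic_labels: List[int], topic_prior: List[float]) -> float:
    """Assumed form: exp(-(1/W) * sum_i log P(z_i | path))."""
    log_sum = sum(math.log(topic_prior[z]) for z in topic_labels)
    return math.exp(-log_sum / len(topic_labels))


topic_prior = [0.9, 0.1]  # hypothetical distribution of topics along the path so far
familiar = topic_perplexity([0, 0, 0], topic_prior)
novel = topic_perplexity([1, 1, 1], topic_prior)
```

Unlike the word-space score, this quantity depends only on how rare the topics are along the path, not on how spread out each topic's word distribution is.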












Fig. 4 Example of results of curiosity based exploration on a 2D dataset. (a)-(c) Input images
used to generate observation data. (d)-(f) Ground truth labeling. (g)-(i) Terrain labeling of
the map using the topic model computed on the path.


5 Experiments 

5.1 Exploration on a 2D Map 

5.1.1 Setup 

To validate our hypothesis that biasing exploration towards high perplexity
cells will result in a better terrain topic model of the environment, we
conducted the following experiment. We considered three different maps: two
aerial views, and one underwater coral reef map.

Fig. 5 Evaluation of the proposed exploration techniques. The plots show mutual information
between the map labeling produced using the topic model computed online during the
exploration, and maps labeled by batch processing of the data.

Fig. 6 Evaluation of the proposed exploration techniques. The plots show mutual information
between the map labeling produced using the topic model computed online during the
exploration, and maps labeled by a human.

Dataset                      width (px)   height (px)   n. cells   n. words
Montreal1 (aerial)           1024         1024          4096       3,239,631
Montreal2 (aerial)           1024         1024          4096       1,675,171
SouthBellairs (underwater)   2500         2500          6241       1,664,749

Table 1 Exploration dataset specifications

We extracted ORB words describing local features, and texton words
describing texture at every pixel (every second pixel for the SouthBellairs
underwater dataset). ORB [17] words had a dictionary size of 5000, and texton
words had a dictionary size of 1000. The dictionary was computed by
extracting features from a completely unrelated dataset.

Each of these maps was decomposed into square cells of width 16 pixels
(32 for SouthBellairs). For each weight function, we computed exploration
paths of varying length, with 20 different random restart locations for each
case. Each time step was fixed at 200 milliseconds to allow the topic model
to converge. We limited the path length to 320 steps, which is about $5\sqrt{|C|}$.
Some basic statistics about the three datasets are given in Table 1.
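
The cell counts and the path length limit can be checked with a short calculation. Counting partially covered cells at the map border is an assumption of this sketch, chosen because it reproduces the cell counts listed in Table 1.

```python
import math


def num_cells(width_px: int, height_px: int, cell_px: int) -> int:
    """Number of square cells needed to tile the map, counting partial border cells."""
    return math.ceil(width_px / cell_px) * math.ceil(height_px / cell_px)


aerial_cells = num_cells(1024, 1024, 16)      # 64 x 64 grid
underwater_cells = num_cells(2500, 2500, 32)  # 79 x 79 grid
path_limit = 5 * math.isqrt(aerial_cells)     # 5 * sqrt(|C|) for the aerial maps
```

For the aerial maps this gives 4096 cells and a limit of 320 steps, i.e., five times the width of the map measured in cells.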

Each of these exploration runs returned a topic model which we then
used to compute topic labels for each pixel in the map in batch mode. Let
$Z_p$ be these topic labels. An example of this labeling for each of the three
datasets is shown in the last row of Figure 4. We compared this topic labeling
with two other labelings: human labeled ground truth, $Z_h$, and labels computed
automatically in batch mode, $Z_b$, where we assume random access to the entire
map.

We then computed the mutual information between $Z_p$ and $Z_h$, and between
$Z_p$ and $Z_b$, and plotted the results as a function of path length, as shown in
Figures 5 and 6.
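
Mutual information between two labelings of the same locations can be computed from their joint label counts, as in the sketch below. Measuring it in bits (base-2 logarithm) and the function name are assumptions of this sketch.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mutual_information(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Mutual information (in bits) between two labelings of the same locations."""
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    mi = 0.0
    for (a, b), n_ab in joint.items():
        # p(a,b) * log2( p(a,b) / (p(a) p(b)) ), written with raw counts.
        mi += (n_ab / n) * math.log2(n_ab * n / (count_a[a] * count_b[b]))
    return mi


# Same partition under different label names: 1 bit of shared information.
same_partition = mutual_information([0, 0, 1, 1], [5, 5, 7, 7])
# Statistically independent labelings: no shared information.
independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

Mutual information is invariant to renaming of labels, which makes it suitable for comparing unsupervised topic labels against ground truth classes that use different identifiers.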

5.1.2 Results 

The results are both encouraging and surprising. As shown in Figures 5 and 6, we
see that topic perplexity based exploration (shown with blue squares) performs
consistently better than all other weight functions, when compared against
ground truth or the batch results.

For paths of length 80, which is close to the width of the maps, we see that
mutual information between topic perplexity based exploration and ground
truth is 1.51, 1.20 and 1.05 times higher respectively for the three datasets,
compared to the next best performing technique.

For long path lengths (320 steps or more), stochastic coverage based
exploration (shown with orange circles) matches the mean performance of topic
perplexity exploration. This is expected because the maps are bounded, and as
the path length increases, the stochastic coverage algorithm is able to stumble
across different terrains, even without a guiding function.

For short path lengths (40 steps or less), we do not see any statistically
significant difference between the performance of the different techniques.

Marked with purple triangles, we see the results of exploration using
Brownian random motion. Although this strategy has a probabilistic guarantee of
asymptotically complete coverage, it does so at a lower rate than the stochastic
coverage exploration strategy. A random walk in two dimensions is expected
to travel a distance of $\sqrt{n}$ from its start, where $n$ is the number of steps. Hence it
is highly likely that it never visits different terrains. The resulting topic models
from these paths are hence unable to resolve between these unseen terrains.
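
The square-root scaling of a random walk can be checked with a small simulation. The sketch below estimates the root-mean-square displacement of a walk on a 2D grid with unit steps; the step model, the number of simulated walks, and the function name are assumptions made for illustration.

```python
import math
import random


def rms_displacement(n_steps: int, n_walks: int, rng: random.Random) -> float:
    """Root-mean-square distance from the start after n_steps unit moves on a 2D grid."""
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    total_sq = 0
    for _ in range(n_walks):
        x = y = 0
        for _ in range(n_steps):
            dx, dy = rng.choice(moves)
            x += dx
            y += dy
        total_sq += x * x + y * y
    return math.sqrt(total_sq / n_walks)


rng = random.Random(0)
rms = rms_displacement(n_steps=320, n_walks=2000, rng=rng)
expected = math.sqrt(320)  # about 17.9 cells
```

After 320 steps the walk is typically only about 18 cells from its start, well short of the 64-cell width of the aerial maps, which is consistent with the random walk rarely reaching new terrain.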

The performance of word perplexity exploration (shown with green
diamonds) is surprisingly poor in most cases. We hypothesize that this poor
performance is due to the algorithm getting pulled towards locations with terrain
described by a more complex word distribution. This will cause the algorithm
to stay in these complex terrains, and not explore as much as the other
algorithms. In comparison, the topic perplexity exploration is not affected by the
complexity of the distribution describing the topic, and is only attracted to
topic rarity.


5.2 Demonstration: Underwater Exploration 

We implemented the proposed curiosity modeling system on the Aqua amphibious
robot, and tested it in three different underwater scenarios, as shown
in the video located at: http://cim.mcgill.ca/mrl/girdhar/rost/aqua_curiosity.mp4.
In this video we see the robot exploring its environment from
two different points of view. We color the cells in the robot’s view with blue,
and change the opacity based on the perplexity score. A cell marked with
a more opaque blue circle has a higher topic perplexity score, and the cell with
the highest score is marked in red. Figure 7 shows some examples
of these high perplexity regions in images observed by the robot. For all our
experiments, we fixed the number of topics to $K = 64$, and set the Dirichlet
hyperparameters $\alpha = 0.1$, $\beta = 0.1$, refinement bias $\eta = 0.5$, and cell curiosity score
decay rate $\gamma = 0.7$.

5.2.1 Scenario 1: Exploring a coral head 

In this trial, we started the robot near a coral head surrounded by monotonous
sand. We see that the robot quickly gets attracted towards the coral head, and
continues to bounce around over this structure while staying away from sand.
We see the effect of the curiosity decay variable $\gamma$, as the robot is able
to return to the coral head several times after going over the much less
interesting sandy regions.

5.2.2 Scenario 2: Interaction with a diver 

Although our goal was to study the robot as it would interact with a fish, due
to lack of cooperation from the fish, we were forced to conduct the experiment
with a scuba diver instead. We see that as soon as the diver is in the robot’s view,
it is the singular source of curiosity for the robot. We see the robot following
the diver around, and hovering over the diver when he has stopped moving.

Fig. 7 Examples of observations showing cells marked with their curiosity score. Red marks
the cell with the highest score.

5.2.3 Scenario 3: Exploring the ocean floor 

In this trial, we started the robot near the ocean floor, which was sparsely 
populated with sea plants and corals. We see the robot manages to keep its 
focus on sea life, while not wasting time over sand. 


6 Summary 

In this paper we have presented a long-term exploration technique that aims to
learn an observation model of the world by finding paths with high information
content. The use of a realtime, life-long learning, topic modeling framework
allows us to describe the incoming streams of low level observation data via
the use of latent variables representing the terrain type. Given this online,
life-long learning model, we compute the utility of the potential next steps in
the path in terms of their perplexity scores. We validated the effectiveness of
the proposed exploration technique over candidate techniques by computing
mutual information between the terrain maps generated through the use of
the learned terrain model, and hand labeled ground truth, on three different
datasets.

In our underwater video demonstration, we see that the emergent behavior 
of the robot has a striking similarity to that of biological organisms. While 
the current work on automated exploration was not explicitly bio-inspired, 
the relationship between exploration by living agents and the behavior that 
emerges from this algorithm might be a fruitful direction for further research. 